NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

TAMPA: interpretable analysis and visualization of metagenomics-based taxon abundance profiles

https://doi.org/10.1093/gigascience/giad008

Sarwal, Varuni; Brito, Jaqueline; Mangul, Serghei; Koslicki, David (December 2022, GigaScience)

Abstract Background Metagenomic taxonomic profiling aims to predict the identity and relative abundance of taxa in a given whole-genome sequencing metagenomic sample. A recent surge in computational methods that aim to accurately estimate taxonomic profiles, called taxonomic profilers, has motivated community-driven efforts to create standardized benchmarking datasets and platforms, standardized taxonomic profile formats, and a benchmarking platform to assess tool performance. While this standardization is essential, there is currently a lack of tools to visualize the standardized output of the many existing taxonomic profilers. Thus, benchmarking studies rely on a single-value metrics to compare performance of tools and compare to benchmarking datasets. This is one of the major problems in analyzing metagenomic profiling data, since single metrics, such as the F1 score, fail to capture the biological differences between the datasets. Findings Here we report the development of TAMPA (Taxonomic metagenome profiling evaluation), a robust and easy-to-use method that allows scientists to easily interpret and interact with taxonomic profiles produced by the many different taxonomic profiler methods beyond the standard metrics used by the scientific community. We demonstrate the unique ability of TAMPA to generate a novel biological hypothesis by highlighting the taxonomic differences between samples otherwise missed by commonly utilized metrics. Conclusion In this study, we show that TAMPA can help visualize the output of taxonomic profilers, enabling biologists to effectively choose the most appropriate profiling method to use on their metagenomics data. TAMPA is available on GitHub, Bioconda, and Galaxy Toolshed at https://github.com/dkoslicki/TAMPA and is released under the MIT license.
more » « less
Full Text Available
pyTCR: A comprehensive and scalable solution for TCR-Seq data analysis to facilitate reproducibility and rigor of immunogenomics research

https://doi.org/10.3389/fimmu.2022.954078

Peng, Kerui; Moore, Jaden; Vahed, Mohammad; Brito, Jaqueline; Kao, Guoyun; Burkhardt, Amanda M.; Alachkar, Houda; Mangul, Serghei (October 2022, Frontiers in Immunology)

T cell receptor (TCR) studies have grown substantially with the advancement in the sequencing techniques of T cell receptor repertoire sequencing (TCR-Seq). The analysis of the TCR-Seq data requires computational skills to run the computational analysis of TCR repertoire tools. However biomedical researchers with limited computational backgrounds face numerous obstacles to properly and efficiently utilizing bioinformatics tools for analyzing TCR-Seq data. Here we report pyTCR, a computational notebook-based solution for comprehensive and scalable TCR-Seq data analysis. Computational notebooks, which combine code, calculations, and visualization, are able to provide users with a high level of flexibility and transparency for the analysis. Additionally, computational notebooks are demonstrated to be user-friendly and suitable for researchers with limited computational skills. Our tool has a rich set of functionalities including various TCR metrics, statistical analysis, and customizable visualizations. The application of pyTCR on large and diverse TCR-Seq datasets will enable the effective analysis of large-scale TCR-Seq data with flexibility, and eventually facilitate new discoveries.
more » « less
Full Text Available
Rigorous benchmarking of T-cell receptor repertoire profiling methods for cancer RNA sequencing

https://doi.org/10.1093/bib/bbad220

Peng, Kerui; Nowicki, Theodore S; Campbell, Katie; Vahed, Mohammad; Peng, Dandan; Meng, Yiting; Nagareddy, Anish; Huang, Yu-Ning; Karlsberg, Aaron; Miller, Zachary; et al (July 2023, Briefings in Bioinformatics)

The ability to identify and track T-cell receptor (TCR) sequences from patient samples is becoming central to the field of cancer research and immunotherapy. Tracking genetically engineered T cells expressing TCRs that target specific tumor antigens is important to determine the persistence of these cells and quantify tumor responses. The available high-throughput method to profile TCR repertoires is generally referred to as TCR sequencing (TCR-Seq). However, the available TCR-Seq data are limited compared with RNA sequencing (RNA-Seq). In this paper, we have benchmarked the ability of RNA-Seq-based methods to profile TCR repertoires by examining 19 bulk RNA-Seq samples across 4 cancer cohorts including both T-cell-rich and T-cell-poor tissue types. We have performed a comprehensive evaluation of the existing RNA-Seq-based repertoire profiling methods using targeted TCR-Seq as the gold standard. We also highlighted scenarios under which the RNA-Seq approach is suitable and can provide comparable accuracy to the TCR-Seq approach. Our results show that RNA-Seq-based methods are able to effectively capture the clonotypes and estimate the diversity of TCR repertoires, as well as provide relative frequencies of clonotypes in T-cell-rich tissues and low-diversity repertoires. However, RNA-Seq-based TCR profiling methods have limited power in T-cell-poor tissues, especially in highly diverse repertoires of T-cell-poor tissues. The results of our benchmarking provide an additional appealing argument to incorporate RNA-Seq into the immune repertoire screening of cancer patients as it offers broader knowledge into the transcriptomic changes that exceed the limited information provided by TCR-Seq.
more » « less
Full Text Available
Benchmarking of computational error-correction methods for next-generation sequencing data

https://doi.org/10.1186/s13059-020-01988-3

Mitchell, Keith; Brito, Jaqueline J.; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S.; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; et al (December 2020, Genome Biology)

Full Text Available
Critical Assessment of Metagenome Interpretation: the second round of challenges

https://doi.org/10.1038/s41592-022-01431-4

Meyer, Fernando; Fritz, Adrian; Deng, Zhi-Luo; Koslicki, David; Lesker, Till Robin; Gurevich, Alexey; Robertson, Gary; Alser, Mohammed; Antipov, Dmitry; Beghini, Francesco; et al (April 2022, Nature Methods)

Abstract Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
more » « less
Full Text Available

Search for: All records